How to extract date from a string in Scala

I have two Strings. The first is like this one:
{"userId":"554555454-45454-54545","start":"20141114T172252.466z","end":"20141228T172252.466z","accounts":[{"date":"20141117T172252.466z","tel":"0049999999999","dec":"a dec","user":"auser"},{"date":"20141118T172252.466z","tel":"004888888888","dec":"another dec","user":"anotheruser"}]}
The second one has the same dates but in a different format. Instead of
20141117T172252.466z
it shows
2014-11-14,17:22:52
I'm trying to extract the dates from the first String and assert that they are the same as the dates in the second String. I tried regular expressions, but I get an "Illegal repetition" error. How can I do this?

You can use SimpleDateFormat from Java:
import java.text.SimpleDateFormat
import java.util.Date
val s1 = "{\"userId\":\"554555454-45454-54545\",\"start\":\"20141114T172252.466z\"}"
val s2 = "{\"userId\":\"554555454-45454-54545\",\"start\":\"2014-11-14,17:22:52\"}"
val i1 = s1.indexOf("start")
val i2 = s2.indexOf("start")
val str1 = s1.replace("T", "_").substring(i1+8, i1+ 23)
val str2 = s2.substring(i2+8, i2+27)
// HH is the 24-hour clock (0-23); hh is the 12-hour clock (1-12) and is wrong for 17:22:52.
val date1: Date = new SimpleDateFormat("yyyyMMdd_HHmmss").parse(str1)
val date2: Date = new SimpleDateFormat("yyyy-MM-dd,HH:mm:ss").parse(str2)
val result = date1==date2
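If fixed substring offsets feel fragile, the same comparison can be sketched with a regex plus the java.time API. It is shown in Java here, but the classes are callable unchanged from Scala. The class and method names below (DateCompare, extractCompact, sameSecond) are made up for illustration, and the patterns assume the two formats look exactly like the samples in the question. As a side note, "Illegal repetition" is the error java.util.regex reports for an unescaped { that is not part of a quantifier, so matching against the raw JSON braces is a likely cause.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the question.
class DateCompare {
    // Compact form used in the first string, e.g. 20141114T172252.466z
    static final DateTimeFormatter COMPACT =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss.SSS'z'");
    // Readable form used in the second string, e.g. 2014-11-14,17:22:52
    static final DateTimeFormatter READABLE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd,HH:mm:ss");
    // Braces used as quantifiers are fine; a literal '{' would have to be escaped.
    static final Pattern COMPACT_DATE = Pattern.compile("\\d{8}T\\d{6}\\.\\d{3}z");

    // Collects every compact timestamp found anywhere in the input text.
    static List<LocalDateTime> extractCompact(String text) {
        List<LocalDateTime> dates = new ArrayList<>();
        Matcher m = COMPACT_DATE.matcher(text);
        while (m.find()) {
            dates.add(LocalDateTime.parse(m.group(), COMPACT));
        }
        return dates;
    }

    // True when both strings denote the same date and time, ignoring milliseconds.
    static boolean sameSecond(String compact, String readable) {
        LocalDateTime a = LocalDateTime.parse(compact, COMPACT).truncatedTo(ChronoUnit.SECONDS);
        LocalDateTime b = LocalDateTime.parse(readable, READABLE);
        return a.equals(b);
    }
}
```

Calling DateCompare.extractCompact on the whole JSON string returns the dates in document order, so they can be compared pairwise with the dates taken from the second string.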

Related

Is it possible to concatenate a string value and a list of objects in terraform

Below is my Terraform code. I have a list with 5 values. Is it possible to concatenate each value in the list with the two string values?
locals{
mylist = ["aaa","bbb","ccc","ddd","eee"]
str1 = "hello"
str2 = "Data"
mergedstring = "${local.str1},local.mylist,${local.str2}"
}
I need the output in the following format
hello,aaa,Data
hello,bbb,Data
hello,ccc,Data
hello,ddd,Data
hello,eee,Data
How can I achieve this?

You can do this with a for expression and join, which builds one line per list item:
locals{
mylist = ["aaa","bbb","ccc","ddd","eee"]
str1 = "hello"
str2 = "Data"
mergedstring = join("\n",[for v in local.mylist: "${local.str1},${v},${local.str2}"])
}

How to save string as json in scala spark

I have raw strings in a log file. After many filters and other operations, I have reached the following problem: I need to convert a string into JSON format so that I can save it as a single object.
Suppose I have the following data:
val CDataTime = "20191012"
val LocationId = "12345"
val SetInstruc = "Comm=Qwe123,Elem=12345,Elem123=Test"
I am trying to create a data frame that contains datetime|location|jsonofinstruction.
The jsonofinstruction column is the JSON of the third val. I tried splitting the string first by comma and then by the equals sign, then looping over the pieces two at a time to build a map with the first as key and the second as value, but the JSON is not created. Please help.

You can use scala.util.parsing.json.JSONObject to convert a map to JSON and then to a string.
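import scala.util.parsing.json.JSONObject
import spark.implicits._ // encoders needed by createDataset, map and toDF
val df = spark.createDataset(Seq("Comm=Qwe123,Elem=12345,Elem123=Test")).toDF("col3")
val dfWithJson = df.map{ row =>
// Split "k1=v1,k2=v2" into Map(k1 -> v1, k2 -> v2)
val insMap = row.getAs[String]("col3").split(",").map{kv =>
val kvArray = kv.split("=")
(kvArray(0),kvArray(1))
}.toMap
val insJson = JSONObject(insMap).toString()
(row.getAs[String]("col3"),insJson)
}.toDF("col3","col4")
// show() returns Unit, so call it separately to keep the DataFrame in dfWithJson
dfWithJson.show()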
Result -
+--------------------+--------------------+
|                col3|                col4|
+--------------------+--------------------+
|Comm=Qwe123,Elem=...|{"Comm" : "Qwe123...|
+--------------------+--------------------+

Is there a way to change dateFormat with java 8 stream?

I want to change the date format from "dd/mm/yyyy" to "yyyy/mm/dd" in one line with a Java 8 stream.
List<String[]> date = new ArrayList<>();
String[] a= {"12/2/2018","a1","a2"};
String[] b= {"13/3/2018","b1","b2"};
String[] c= {"14/4/2018","c1","c2"};
date.add(a);
date.add(b);
date.add(c);
I expect the output to be
{{"2018/2/12","a1","a2"},{"2018/3/13","b1","b2"},{"2018/4/14","c1","c2"}}

I hope you mean yyyy/MM/dd, because m is for minutes and M is for month.
Consider using map from the Stream API:
public static void main(String[] args) {
List<String[]> date = new ArrayList<>();
String[] a= {"12/2/2018","a1","a2"};
String[] b= {"13/3/2018","b1","b2"};
String[] c= {"14/4/2018","c1","c2"};
date.add(a);
date.add(b);
date.add(c);
List<String[]> even = date.stream().map(
s -> {
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("d/M/yyyy");
LocalDate localDate = LocalDate.parse(s[0], formatter);
DateTimeFormatter formatterNew = DateTimeFormatter.ofPattern("yyyy/MM/dd");
return new String[]{formatterNew.format(localDate), s[1],s[2]};
}
).collect(Collectors.toList());
even.forEach(x-> System.out.println(Arrays.toString(x)));
}
that will print out
[2018/02/12, a1, a2]
[2018/03/13, b1, b2]
[2018/04/14, c1, c2]
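To see why the m versus M distinction matters, here is a small sketch. The class and method names (PatternLetters, formatWith) are hypothetical and the date is hard-coded for illustration. A LocalDate has no minute field, so formatting it with a pattern that contains m fails at format time instead of silently printing a month.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class, not part of the answer above.
class PatternLetters {
    // Formats a fixed date (12 February 2018) with the given pattern.
    // Returns a marker string if the pattern asks for a field LocalDate lacks.
    static String formatWith(String pattern) {
        LocalDate date = LocalDate.of(2018, 2, 12);
        try {
            return DateTimeFormatter.ofPattern(pattern).format(date);
        } catch (DateTimeException e) {
            // 'm' means minute-of-hour, which a date without a time cannot supply
            return "unsupported: " + e.getMessage();
        }
    }
}
```

With this helper, formatWith("yyyy/MM/dd") gives 2018/02/12, while formatWith("yyyy/mm/dd") ends up in the catch branch.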

You can not do this without iterating over all items.
For your simple case dd/mm/yyyy to yyyy/mm/dd you can just use this:
date.forEach(i -> {
String[] parts = i[0].split("/");
i[0] = parts[2] + "/" + parts[1] + "/" + parts[0];
});
Using java time api you can use this:
DateTimeFormatter toFormat = DateTimeFormatter.ofPattern("yyyy/M/d");
DateTimeFormatter fromFormat = DateTimeFormatter.ofPattern("d/M/yyyy");
date.forEach(i -> i[0] = LocalDate.parse(i[0], fromFormat).format(toFormat));

You can do this with Java 8 streams, but not in one line. Use two date patterns, one for the input and the other for the output:
DateTimeFormatter inFormat = DateTimeFormatter.ofPattern("dd/M/yyyy");
DateTimeFormatter outFormat = DateTimeFormatter.ofPattern("yyyy/M/dd");
List<String[]> date = new ArrayList<>();
String[] a= {"12/2/2018","a1","a2"};
String[] b= {"13/3/2018","b1","b2"};
String[] c= {"14/4/2018","c1","c2"};
date.add(a);
date.add(b);
date.add(c);
// Each element is a String[], so the lambda updates the array in place and returns it
List<String[]> result = date.stream().map(arr->{
arr[0]=LocalDate.parse(arr[0],inFormat).format(outFormat);
return arr;
}).collect(Collectors.toList());

Why use a stream?
DateTimeFormatter originalFormatter = DateTimeFormatter.ofPattern("d/M/u");
DateTimeFormatter wantedFormatter = DateTimeFormatter.ofPattern("u/M/d");
date.forEach(arr -> {
LocalDate ld = LocalDate.parse(arr[0], originalFormatter);
arr[0] = ld.format(wantedFormatter);
});
To inspect the result:
date.forEach(arr -> System.out.println(Arrays.toString(arr)));
Output:
[2018/2/12, a1, a2]
[2018/3/13, b1, b2]
[2018/4/14, c1, c2]
My code (as well as the code in some of the other answers) modifies your original arrays. Please decide if this is OK. If it is, you shouldn’t really use a stream since they are supposed to be free from side-effects. If you do need the original arrays to remain untouched, a stream is fine (using the same formatters as before):
List<String[]> newDateList = date.stream()
.map(arr -> {
LocalDate ld = LocalDate.parse(arr[0], originalFormatter);
String[] newArr = Arrays.copyOf(arr, arr.length);
newArr[0] = ld.format(wantedFormatter);
return newArr;
})
.collect(Collectors.toList());
newDateList.forEach(arr -> System.out.println(Arrays.toString(arr)));
Output is the same as before.

match a timestamp based on regex pattern matching scala

I wrote the following code :
val reg = "([\\d]{4})-([\\d]{2})-([\\d]{2})(T)([\\d]{2}):([\\d]{2})".r
val dataExtraction: String => Map[String, String] = {
string: String => {
string match {
case reg(year, month, day, symbol, hour, minutes) =>
Map(YEAR -> year, MONTH -> month, DAY -> day, HOUR -> hour)
case _ => Map(YEAR -> "", MONTH -> "", DAY -> "", HOUR -> "")
}
}
}
val YEAR = "YEAR"
val MONTH = "MONTH"
val DAY = "DAY"
val HOUR = "HOUR"
This function is supposed to be applied to strings having the following format: 2018-08-22T19:10:53.094Z
When I call the function :
dataExtraction("2018-08-22T19:10:53.094Z")

Your pattern, for all its deficiencies, does work. You just have to unanchor it.
val reg = "([\\d]{4})-([\\d]{2})-([\\d]{2})(T)([\\d]{2}):([\\d]{2})".r.unanchored
. . .
dataExtraction("2018-08-22T19:10:53.094Z")
//res0: Map[String,String] = Map(YEAR -> 2018, MONTH -> 08, DAY -> 22, HOUR -> 19)
But the comment from @CAustin is correct: you could just let the Java LocalDateTime API handle all the heavy lifting.
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter._
val dt = LocalDateTime.parse("2018-08-22T19:10:53.094Z", ISO_DATE_TIME)
Now you have access to all the data without actually saving it to a Map.
dt.getYear //res0: Int = 2018
dt.getMonthValue //res1: Int = 8
dt.getDayOfMonth //res2: Int = 22
dt.getHour //res3: Int = 19
dt.getMinute //res4: Int = 10
dt.getSecond //res5: Int = 53
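One caveat: LocalDateTime drops the trailing Z, which is the UTC offset. If the offset matters, a possible variant is to parse into OffsetDateTime instead. This is only a sketch in Java (the same calls work from Scala); the class name IsoTimestamp and the map keys are assumptions that mirror the constants in the question.

```java
import java.time.OffsetDateTime;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; keys mirror the YEAR/MONTH/DAY/HOUR constants from the question.
class IsoTimestamp {
    // Parses an ISO-8601 timestamp with an offset, e.g. 2018-08-22T19:10:53.094Z,
    // and returns the requested fields in insertion order.
    static Map<String, Integer> fields(String timestamp) {
        OffsetDateTime odt = OffsetDateTime.parse(timestamp); // keeps the offset
        Map<String, Integer> out = new LinkedHashMap<>();
        out.put("YEAR", odt.getYear());
        out.put("MONTH", odt.getMonthValue());
        out.put("DAY", odt.getDayOfMonth());
        out.put("HOUR", odt.getHour());
        return out;
    }
}
```

OffsetDateTime.parse uses the ISO offset format by default, so no explicit formatter is needed for strings of this shape.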

Your pattern matches only strings that look exactly like yyyy-mm-ddThh:mm, while the one you are testing against has milliseconds and a Z at the end.
You can append .* at the end of your pattern to cover strings that have additional characters at the end.
In addition, let me show you a more idiomatic way of writing your code:
// Create a type for the data instead of using a map.
case class Timestamp(year: Int, month: Int, day: Int, hour: Int, minutes: Int)
// Use triple quotes to avoid extra escaping.
// Don't capture parts that you will not use.
// Add .* at the end to account for milliseconds and timezone.
val reg = """(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}).*""".r
// Instead of empty strings, use Option to represent a value that can be missing.
// Convert to Int after parsing.
def dataExtraction(str: String): Option[Timestamp] = str match {
case reg(y, m, d, h, min) => Some(Timestamp(y.toInt, m.toInt, d.toInt, h.toInt, min.toInt))
case _ => None
}
// It works!
dataExtraction("2018-08-22T19:10:53.094Z") // => Some(Timestamp(2018,8,22,19,10))
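The anchoring behaviour behind both answers can be seen directly in the underlying java.util.regex API: a Scala Regex used in a pattern match must cover the whole input, like Matcher.matches, while an unanchored one behaves like Matcher.find. A minimal sketch in Java, reusing the question's pattern; the class and method names (AnchoringDemo, wholeMatch, hourIfFound) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class contrasting whole-input matching with searching.
class AnchoringDemo {
    // Same pattern as in the question, without the capture group around T.
    static final Pattern TS =
            Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2})");

    // matches() succeeds only if the pattern covers the entire input.
    static boolean wholeMatch(String s) {
        return TS.matcher(s).matches();
    }

    // find() succeeds if the pattern occurs anywhere in the input.
    static String hourIfFound(String s) {
        Matcher m = TS.matcher(s);
        return m.find() ? m.group(4) : "";
    }
}
```

For "2018-08-22T19:10:53.094Z", wholeMatch is false because of the trailing ":53.094Z", while hourIfFound still returns "19".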

Pattern matching - spark scala RDD

I am new to Spark and Scala, coming from an R background. After a few transformations, I get an RDD of type
Description: RDD[(String, Int)]
Now I want to apply a regular expression to the String part of the RDD, extract substrings from it, and add just those substrings as new columns.
Input Data :
BMW 1er Model,278
MINI Cooper Model,248
Output I am looking for :
Input | Brand | Series
BMW 1er Model,278 | BMW | 1er
MINI Cooper Model,248 | MINI | Cooper
where Brand and Series are newly calculated substrings from String RDD
What I have done so far.
I could achieve this for a single String using a regular expression, but I can't apply it to all lines.
val brandRegEx = """^.*[Bb][Mm][Ww]+|.[Mm][Ii][Nn][Ii]+.*$""".r //to look for BMW or MINI
Then I can use
brandRegEx.findFirstIn("hello this mini is bmW testing")
But how can I use it for all the lines of the RDD, and apply different regular expressions to produce the output above?
I read about this code snippet, but I'm not sure how to put it all together.
val brandRegEx = """^.*[Bb][Mm][Ww]+|.[Mm][Ii][Nn][Ii]+.*$""".r
def getBrand(Col4: String) : String = Col4 match {
case brandRegEx(str) =>
case _ => ""
return 'substring
}
Any help would be appreciated !
Thanks

To apply your regex to each item in the RDD, you should use the RDD map function, which transforms each row in the RDD using some function (in this case, a partial function that extracts the two parts of the tuple making up each row):
import org.apache.spark.{SparkContext, SparkConf}
object Example extends App {
val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("Example"))
val data = Seq(
("BMW 1er Model",278),
("MINI Cooper Model",248))
val dataRDD = sc.parallelize(data)
val processedRDD = dataRDD.map{
case (inString, inInt) =>
val brandRegEx = """^.*[Bb][Mm][Ww]+|.[Mm][Ii][Nn][Ii]+.*$""".r
val brand = brandRegEx.findFirstIn(inString).getOrElse("NOT FOUND") // findFirstIn returns an Option[String]
//val seriesRegEx = ...
//val series = seriesRegEx.findFirstIn(inString)
val series = "foo"
(inString, inInt, brand, series)
}
processedRDD.collect().foreach(println)
sc.stop()
}
Note that I think you have some problems in your regular expression, and you also need a regular expression for finding the series. This code outputs:
(BMW 1er Model,278,BMW,foo)
(MINI Cooper Model,248,NOT FOUND,foo)
But if you correct your regexes for your needs, this is how you can apply them to each row.
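As one possible direction for the corrected regexes, a single pattern with two capture groups can pull out brand and series together. This is a sketch under the assumption that the series is simply the word following the brand; the pattern, class name, and method name (BrandSeries, extract) are hypothetical. It is written in Java, but the same pattern string can be used with .r in Scala inside the map above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: assumes the series is the word right after the brand.
class BrandSeries {
    static final Pattern BRAND_SERIES =
            Pattern.compile("\\b(BMW|MINI)\\s+(\\S+)", Pattern.CASE_INSENSITIVE);

    // Returns {brand, series} for the first brand found, or empty if none is present.
    static Optional<String[]> extract(String description) {
        Matcher m = BRAND_SERIES.matcher(description);
        if (!m.find()) {
            return Optional.empty();
        }
        // Normalise the brand to upper case; keep the series as written.
        return Optional.of(new String[] { m.group(1).toUpperCase(), m.group(2) });
    }
}
```

For "BMW 1er Model" this yields BMW and 1er, and for "MINI Cooper Model" it yields MINI and Cooper; descriptions without either brand give an empty result.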

Hi, I was just looking at another question and came across this one. The above problem can be solved using ordinary transformations:
val a=sc.parallelize(collection)
a.map{case (x,y)=>(x.split (" ")(0)+" "+x.split(" ")(1))}.collect
